Conversation
@ldematte ldematte commented Oct 10, 2025

This PR changes how we gather and compact vector data for transfer to the GPU. Instead of writing the compacted arrays to a temporary file, we read the vector values directly from the scorer supplier, which are backed by a memory-mapped input. This avoids an additional copy of the data.
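The core idea (reading vectors straight off a memory mapping instead of staging them in an extra copy) can be illustrated with plain `java.nio`. This is a generic sketch under stated assumptions, not the PR's actual code: the PR goes through Lucene's vector values and scorer supplier, and `MappedVectorsSketch`, `writeAndReadBack`, and the trivial file layout here are all hypothetical.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MappedVectorsSketch {

    // Write numVectors x dims floats to a temp file, then read them back
    // through a read-only memory mapping: the OS pages the data in on
    // demand, and no additional on-heap or on-disk copy is made.
    static float[][] writeAndReadBack(int numVectors, int dims) throws IOException {
        Path file = Files.createTempFile("vectors", ".bin");
        ByteBuffer out = ByteBuffer.allocate(numVectors * dims * Float.BYTES)
            .order(ByteOrder.LITTLE_ENDIAN);
        for (int v = 0; v < numVectors; v++) {
            for (int d = 0; d < dims; d++) {
                out.putFloat(v * 10 + d); // synthetic, easily checkable values
            }
        }
        Files.write(file, out.array());

        float[][] vectors = new float[numVectors][dims];
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            var mapped = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size())
                .order(ByteOrder.LITTLE_ENDIAN);
            for (int v = 0; v < numVectors; v++) {
                // absolute bulk get: copies one vector out of the mapping
                mapped.asFloatBuffer().get(v * dims, vectors[v]);
            }
        } finally {
            Files.delete(file);
        }
        return vectors;
    }

    public static void main(String[] args) throws IOException {
        for (float[] vec : writeAndReadBack(3, 4)) {
            System.out.println(java.util.Arrays.toString(vec));
        }
    }
}
```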

@ldematte ldematte requested a review from a team as a code owner October 10, 2025 15:14
@ldematte ldematte added >non-issue auto-backport Automatically create backport pull requests when merged :Search Relevance/Vectors Vector search test-gpu Run tests using a GPU v9.2.1 v9.3.0 labels Oct 10, 2025
@elasticsearchmachine (Collaborator)

Pinging @elastic/es-search-relevance (Team:Search Relevance)

@elasticsearchmachine elasticsearchmachine added the Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch label Oct 10, 2025
@ldematte ldematte changed the title [Gpu] Optimize merge memory usage [GPU] Optimize merge memory usage Oct 10, 2025
"-Dio.netty.noUnsafe=true",
"-Dio.netty.noKeySetOptimization=true",
"-Dio.netty.recycler.maxCapacityPerThread=0",
// temporary until we get access to raw vectors in a future Lucene version
Contributor
Is there an open Lucene issue or PR for that?

Contributor Author
Not yet; depending on how #136416 goes, and on the opinions of people with more Lucene expertise (you, Chris, Ben), I'd like to generalize what we did there and raise a Lucene issue for it.

Member
FYI, I honestly don't see why Lucene would ever expose this information. It expands an API for no good purpose within Lucene.

Contributor Author
Maybe not this API as-is, but I think there is value in being able to access, in a convenient and efficient way, what has already been written: it avoids writing the same data more than once, or keeping copies in memory, when we need the original data (e.g. raw vectors) to add "something" on top of it (e.g. quantized vectors, a graph, etc.).
(But maybe I'm naive.)

import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

public class VectorsFormatReflectionUtils {
Contributor
Wow, very nice organization! +1 for using VarHandle for reflection
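The pattern being praised here, resolving a `VarHandle` once in a static initializer rather than going through per-call `Field` reflection, looks roughly like the following. This is a generic sketch, not the actual `VectorsFormatReflectionUtils` code; `Target`, `hidden`, and `readHidden` are hypothetical names.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

public class VarHandleReflectionSketch {

    // Stand-in for a class whose private internals we need to reach,
    // the way the PR reaches into Lucene's vector writers.
    static class Target {
        private int hidden = 42;
    }

    // Resolve the VarHandle once, at class initialization. privateLookupIn
    // needs access to the target class; within the same module (as here)
    // it succeeds without extra --add-opens flags.
    private static final VarHandle HIDDEN;
    static {
        try {
            HIDDEN = MethodHandles.privateLookupIn(Target.class, MethodHandles.lookup())
                .findVarHandle(Target.class, "hidden", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    static int readHidden(Target t) {
        // After the first use, VarHandle access JIT-compiles down to a
        // plain field read, unlike Field.get which stays reflective.
        return (int) HIDDEN.get(t);
    }

    public static void main(String[] args) {
        System.out.println(readHidden(new Target())); // prints 42
    }
}
```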

@mayya-sharipova (Contributor) left a comment
@ldematte Great work! I have not tested it yet, but it's impressive how you organized it. My main comment: do you think we can simplify this PR by breaking it into two separate ones, making this PR only about changes to merges, and doing the changes for flush, ResourcesHolder, and 128MB in a separate PR? Or are these changes tightly coupled?

@ldematte (Contributor Author)
> doing changes for flush, ResourcesHolder, 128Mb in a separate PR?

I can do that: here is the PR #136464

@mayya-sharipova (Contributor)
@ldematte Great changes. I have done some benchmarking on my laptop with int8, and I see great recall but, surprisingly, no speedups compared with the main branch:

gist: 1_000_000 docs; 960 dims; euclidean metric

| index_type | index_time (ms) | force_merge_time (ms) | QPS | single segment recall |
|------------|-----------------|-----------------------|-----|-----------------------|
| gpu main   | 61422           | 69010                 | 353 | 0.97                  |
| gpu PR     | 59035           | 67766                 | 296 | 0.98                  |

cohere-wikipedia_v2: 934_024 docs; 768 dims; cosine metric

| index_type | index_time (ms) | force_merge_time (ms) | QPS | single segment recall |
|------------|-----------------|-----------------------|-----|-----------------------|
| gpu main   | 48164           | 47657                 | 384 | 0.99                  |
| gpu PR     | 47824           | 47354                 | 393 | 0.99                  |

// problems with strides; the explicit copy removes the stride while copying.
// Note that this is _not_ an additional copy: input data needs to be moved to GPU memory anyway,
// we are just doing it explicitly instead of relying on CagraIndex#build to do it.
var deviceDataSet = dataset.toDevice(resourcesHolder.resources())
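The stride removal the comment above describes can be sketched generically: copying rows out of a padded (strided) layout into a dense, contiguous buffer that a device transfer can consume as-is. This is a hypothetical illustration, not the cuVS/Cagra code; `StrideRemovalSketch` and `compact` are invented names.

```java
public class StrideRemovalSketch {

    // Copy `rows` vectors of `dims` floats out of a source laid out with a
    // row stride larger than dims (e.g. padded rows), producing a dense
    // buffer with no gaps between consecutive vectors.
    static float[] compact(float[] strided, int rows, int dims, int rowStride) {
        float[] dense = new float[rows * dims];
        for (int r = 0; r < rows; r++) {
            System.arraycopy(strided, r * rowStride, dense, r * dims, dims);
        }
        return dense;
    }

    public static void main(String[] args) {
        // 2 rows of 3 floats, stored with a stride of 4:
        // one float of padding (-1) at the end of each row.
        float[] strided = {1, 2, 3, -1, 4, 5, 6, -1};
        System.out.println(java.util.Arrays.toString(compact(strided, 2, 3, 4)));
        // prints [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    }
}
```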
Contributor
Nice workaround! So you also confirmed that strides don't work properly with the Cagra index implementation?

@mayya-sharipova (Contributor) left a comment
Great work, @ldematte
